Towards Distribution-Free Multi-Armed Bandits with Combinatorial Strategies
Authors
Abstract
We consider the following linear combinatorial multi-armed bandits (MABs) problem. In a discrete-time system, there are K unknown random variables (RVs), i.e., arms, each evolving as an i.i.d. stochastic process over time. At each time slot, we select a set of N (N ≤ K) RVs, i.e., a strategy, subject to an arbitrary constraint. We then gain a reward that is a linear combination of the observations on the selected RVs. Our goal is to minimize the regret, defined as the difference between the cumulative reward obtained by an optimal static policy that knows the mean of each RV and that obtained by a specified learning policy that does not. A prior result for this problem achieves zero regret (the time-averaged expected regret approaches zero as time goes to infinity), but its guarantee depends on the probability distribution of the strategies generated by the learning policy: the regret becomes arbitrarily large when the gap between the rewards of the best and second-best strategies approaches zero. Meanwhile, when the number of combinations is exponential, a naive extension of a prior distribution-free policy performs poorly in terms of regret, computation, and space complexity. We propose an efficient Distribution-Free Learning (DFL) policy that achieves zero regret without depending on the probability distribution of strategies. Our learning policy requires only O(K) time and space. When maximizing the linear combination involves an NP-hard problem, our policy provides a flexible scheme for choosing among approximation algorithms that solve the problem efficiently while retaining zero regret.
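To make the objective concrete, the regret described above can be written as follows. The notation is introduced here only for illustration and is not taken from the paper: the horizon T, the feasible strategy family \mathcal{F}, the known linear weights a_i, the observation X_i(t) of arm i at slot t with mean \mu_i, and the strategy S_t chosen by the learning policy at slot t.

R(T) \;=\; T \max_{S \in \mathcal{F},\, |S| = N} \sum_{i \in S} a_i \mu_i \;-\; \mathbb{E}\Big[ \sum_{t=1}^{T} \sum_{i \in S_t} a_i X_i(t) \Big].

Under this notation, "zero regret" in the abstract's sense means R(T)/T → 0 as T → ∞, i.e., the learning policy's per-slot reward converges to that of the optimal static strategy.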
Similar references
Semi-Bandits with Knapsacks
We unify two prominent lines of work on multi-armed bandits: bandits with knapsacks and combinatorial semi-bandits. The former concerns limited “resources” consumed by the algorithm, e.g., limited supply in dynamic pricing. The latter allows a huge number of actions but assumes combinatorial structure and additional feedback to make the problem tractable. We define a common generalization, supp...
Combinatorial Multi-armed Bandits for Real-Time Strategy Games
Games with large branching factors pose a significant challenge for game tree search algorithms. In this paper, we address this problem with a sampling strategy for Monte Carlo Tree Search (MCTS) algorithms called naïve sampling, based on a variant of the Multiarmed Bandit problem called Combinatorial Multi-armed Bandits (CMAB). We analyze the theoretical properties of several variants of naïve...
Anytime optimal algorithms in stochastic multi-armed bandits
We introduce an anytime algorithm for stochastic multi-armed bandits with optimal distribution-free and distribution-dependent bounds (for a specific family of parameters). The performance of this algorithm (as well as another one motivated by the conjectured optimal bound) is evaluated empirically. A similar analysis is provided with full information, to serve as a benchmark.
Online Multi-Armed Bandit
We introduce a novel variant of the multi-armed bandit problem, in which bandits are streamed one at a time to the player, and at each point, the player can either choose to pull the current bandit or move on to the next bandit. Once a player has moved on from a bandit, they may never visit it again, which is a crucial difference between our problem and classic multi-armed bandit problems. In t...
Schemata Bandits for Binary Encoded Combinatorial Optimisation Problems
We introduce the schema bandits algorithm to solve binary combinatorial optimisation problems, like the trap functions and NK landscape, where potential solutions are represented as bit strings. Schema bandits are influenced by two different areas in machine learning, evolutionary computation and multiarmed bandits. The schemata from the schema theorem for genetic algorithms are structured as h...